Information processing method and information processing apparatus

ABSTRACT

The present disclosure relates to an information processing method and an information processing apparatus. The information processing method according to the present disclosure performs training on a classification model by using a plurality of training samples, and comprises the steps of: adjusting a distribution of feature vectors of the plurality of training samples in a feature space based on a typical sample in the plurality of training samples; and performing training on the classification model by using the adjusted feature vectors of the plurality of training samples. Through the technology according to the present disclosure, it is possible to perform pre-adjustment on training samples before training, such that it is possible to reduce discrimination between training samples belonging to a same class and increase discrimination between training samples belonging to different classes in the training process. The classification model trained as such is capable of performing accurate classification on samples acquired under an extreme condition.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the priority benefit of Chinese Patent Application No. 201810662632.2, filed on Jun. 25, 2018 in the China National Intellectual Property Administration, the disclosure of which is incorporated herein in its entirety by reference.

FIELD OF THE INVENTION

Embodiments of the present disclosure relate to an information processing method and an information processing apparatus. In particular, embodiments of the present disclosure relate to an information processing method and an information processing apparatus which perform training on a classification model by using a plurality of training samples.

BACKGROUND

The development of deep learning methods based on Convolutional Neural Networks (CNNs) and the construction of large-scale databases with a large number of labeled face images have greatly improved the performance of face recognition. For face images acquired under an extreme condition where there are great changes in aspects such as angle of view, resolution, occlusion, image quality and so on, however, face recognition based on convolutional neural networks still cannot achieve a relatively high accuracy.

Softmax function, as a classification model, has been widely applied in convolutional neural networks. In this case, Softmax loss function is used for training of convolutional neural networks. However, a convolutional neural network trained using the current Softmax loss function is only adapted for recognizing face images with high-quality data, but cannot achieve a satisfactory effect for recognition of face images acquired under an extreme condition.

Therefore, it is necessary to improve the existing Softmax loss function, so as to make it possible to perform accurate recognition on face images acquired under an extreme condition.

SUMMARY OF THE INVENTION

A brief summary of the present disclosure is given below to provide a basic understanding of some aspects of the present disclosure. It should be understood that the summary is not exhaustive; it does not intend to define a key or important part of the present disclosure, nor does it intend to limit the scope of the present disclosure. The object of the summary is only to briefly present some concepts, which serve as a preamble of the detailed description that follows.

An object of the present disclosure is to provide an information processing method and an information processing apparatus. By the information processing method and the information processing apparatus according to the present disclosure, training is performed on a classification model by using a plurality of labeled training samples, so as to obtain a classification model capable of performing accurate classification on samples acquired under an extreme condition.

To achieve the object of the present disclosure, according to an aspect of the present disclosure, there is provided an information processing method, which performs training on a classification model by using a plurality of training samples, the method comprising: adjusting a distribution of feature vectors of the plurality of training samples in a feature space based on a typical sample in the plurality of training samples; and performing training on the classification model by using the adjusted feature vectors of the plurality of training samples.

According to another aspect of the present disclosure, there is provided an information processing apparatus, which performs training on a classification model by using a plurality of training samples, the apparatus comprising: an adjusting unit to adjust a distribution of feature vectors of the plurality of training samples in a feature space based on a typical sample in the plurality of training samples; and a learning unit to perform training on the classification model by using the adjusted feature vectors of the plurality of training samples.

According to still another aspect of the present disclosure, there is provided an information processing method, which comprises detecting data to be detected, by using a classification model obtained by performing training by the information processing methods according to the above aspects of the present disclosure.

According to yet another aspect of the present disclosure, there is further provided a computer program capable of implementing the above information processing methods. Moreover, there is further provided a computer program product in at least computer readable medium form, which has recorded thereon computer program code for implementing the above information processing methods.

By performing training on a classification model by using a plurality of training samples through the technology according to the present disclosure, it is possible to realize an improvement in the classification model without significantly increasing calculation costs. In comparison with traditional classification models, it is possible to perform accurate classification on samples acquired under an extreme condition, through the classification model trained by the information processing method according to the present disclosure. That is, the technology according to the present disclosure can guide a model to learn training samples with relatively high discrimination.

BRIEF DESCRIPTION OF THE DRAWINGS

The above and other objects, features and advantages of the present disclosure will be understood more easily with reference to the descriptions of embodiments of the present disclosure combined with the drawings below. In the drawings:

FIG. 1 is a flowchart showing an information processing method according to a first embodiment of the present disclosure;

FIG. 2A and FIG. 2B are schematic views showing examples of taking face images as training samples;

FIG. 3 is a flowchart showing an information processing method according to a second embodiment of the present disclosure;

FIG. 4A, FIG. 4B, FIG. 4C and FIG. 4D are schematic views showing geometric interpretations of respective steps of the information processing method according to the second embodiment of the present disclosure;

FIG. 5 is a block diagram showing an information processing apparatus according to an embodiment of the present disclosure; and

FIG. 6 is a structure diagram of a general-purpose machine 600 that can be used to realize the information processing methods 100, 300 and the information processing apparatus 500 which perform training on a classification model by using a plurality of training samples according to the embodiments of the present disclosure.

DETAILED DESCRIPTION OF THE INVENTION

Hereinafter, some embodiments of the present disclosure will be described in detail with reference to the appended illustrative diagrams. In denoting elements in figures by reference signs, identical elements will be denoted by identical reference signs although they are shown in different figures. Moreover, in the descriptions of the present disclosure below, detailed descriptions of known functions and configurations incorporated into the present disclosure will be omitted where they would make the subject matter of the present disclosure unclear.

It should also be noted herein that, to prevent the present disclosure from being obscured by unnecessary details, only those device structures and/or processing steps closely related to the solution according to the present disclosure are shown in the drawings, while other details not closely related to the present disclosure are omitted.

Herein, although embodiments of the present disclosure are described under the background of applying Softmax function as a classification model to Convolutional Neural Networks (CNNs) to perform face recognition, the present disclosure is not limited to this. Under the teaching of the present disclosure, those skilled in the art could envisage expanding the inventive idea of the present disclosure to other classification models (such as Sigmoid function and Tanh function) and other application fields (such as speech recognition), and all of these variant solutions should be covered within the scope of the present disclosure.

As a classification model, Softmax function may be understood as a combination of a max function, which takes a maximum value from among a plurality of values, with the probability of each of the plurality of values being that maximum value. Softmax function, as an activation function, has been widely applied in various artificial neural networks.

A convolutional neural network is a feedforward artificial neural network, and has been widely applied to the field of image and speech processing. The convolutional neural network is based on three important features, i.e., receptive field, weight sharing, and pooling.

The convolutional neural network assumes that each neuron has a connection relationship with only neurons in an adjacent area and that they produce influence upon each other. The receptive field represents a size of the adjacent area. In addition, the convolutional neural network assumes that connection weights between neurons in a certain area may also be applied to other areas, namely weight sharing. The pooling of the convolutional neural network refers to a dimension reduction operation performed based on aggregation statistics when the convolutional neural network is used for solving the problem of classification.

Softmax function is used for mapping an output of the convolutional neural network to the interval [0, 1], to represent a probability that input data belongs to a corresponding class, and thus is regarded as a classification model.
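
As an illustration only (not part of the disclosure), a minimal NumPy sketch of this mapping might look as follows; the function name and example values are assumptions:

```python
import numpy as np

def softmax(z):
    """Map raw network outputs z to the interval [0, 1] so that the
    entries sum to 1 and can be read as class probabilities."""
    z = z - z.max()                # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # e.g. [0.659 0.242 0.099]
```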

In a training process of the convolutional neural network, it is necessary to calculate a difference between a forward propagation result of the convolutional neural network and a result calculated with labeled training samples, and to use the difference as a loss to perform back propagation of the convolutional neural network, so as to perform training on the convolutional neural network. In a case where Softmax function is used in a pooling operation, Softmax loss function is used for performing learning of weights of the convolutional neural network.

To be specific, Softmax loss function is in the form as shown by the following equation (1).

$L_{softmax} = -\frac{1}{N}\sum_{i=1}^{N}\log\left(\frac{e^{W_{y_i}^{T}x_{i}+b_{y_i}}}{\sum_{j=1}^{C}e^{W_{j}^{T}x_{i}+b_{j}}}\right)$  Equation (1)

L_(softmax) represents a loss of Softmax function, which is defined as a cross entropy. N represents the number of characterized training samples x_(i) (1≤i≤N), and C represents the count of classes. Note that the expression “training sample” in the present disclosure refers to a sample used to perform training on a classification model, i.e., a labeled sample; for example, training samples x_(i) are labeled as y_(i). Herein, the characterized training samples x_(i) are M-dimensional vectors, and are labeled as y_(i), which is a certain class of the C classes.

W and b represent a C×M-dimensional weight matrix and a C-dimensional bias vector of the convolutional neural network, respectively. W_(j) (1≤j≤C) represents a weight vector corresponding to a j-th class of the C classes in the weight matrix, and may be understood as the parameters corresponding to the j-th class, which serve as M-dimensional vectors.
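
For concreteness, a minimal NumPy sketch of equation (1) is given below; the array names mirror the symbols above (`W`, `b`, `X`, `y`) and are assumptions for illustration, not part of the disclosure:

```python
import numpy as np

def softmax_loss(W, b, X, y):
    """Cross-entropy loss of equation (1).

    W: (C, M) weight matrix, b: (C,) bias vector,
    X: (N, M) characterized training samples x_i,
    y: (N,) integer labels y_i in [0, C).
    """
    logits = X @ W.T + b                          # (N, C): W_j^T x_i + b_j
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # mean negative log-probability assigned to the labeled class
    return -log_probs[np.arange(len(y)), y].mean()
```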

The following equation (2) could be obtained by performing further transformation on the equation (1).

$L_{softmax} = -\frac{1}{N}\sum_{i=1}^{N}\log\left(\frac{e^{\|W_{y_i}\|\,\|x_{i}\|\cos(\theta_{y_i,i})+b_{y_i}}}{\sum_{j=1}^{C}e^{\|W_{j}\|\,\|x_{i}\|\cos(\theta_{j,i})+b_{j}}}\right)$  Equation (2)

Where ∥W_(y_i)∥ and ∥W_(j)∥ represent moduli of the M-dimensional weight vectors, ∥x_(i)∥ represents a modulus of an M-dimensional training sample vector, and θ_(j,i) represents an included angle between the weight vector W_(j) and the training sample vector x_(i) in a vector space, where 0≤θ_(j,i)≤π.
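
The step from equation (1) to equation (2) rests only on the identity W_(j)^T x_(i) = ∥W_(j)∥∥x_(i)∥cos(θ_(j,i)); a short sketch (with hypothetical random vectors) can confirm it numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=8)    # an M = 8 dimensional weight vector W_j
x = rng.normal(size=8)    # an M = 8 dimensional sample vector x_i

dot = w @ x
cos_theta = dot / (np.linalg.norm(w) * np.linalg.norm(x))
theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))   # included angle, 0 <= theta <= pi

# W_j^T x_i == ||W_j|| * ||x_i|| * cos(theta_{j,i})
assert np.isclose(dot, np.linalg.norm(w) * np.linalg.norm(x) * np.cos(theta))
```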

Softmax function and its loss function L_(softmax) are well known to those skilled in the art, and thus no description thereof will be made in more detail. However, those skilled in the art should appreciate that, although the present disclosure describes embodiments of the present disclosure based on Softmax function and its loss function L_(softmax), the idea of the present disclosure may also be applied to other classification models, and may be applied to artificial neural networks other than convolutional neural networks, such as Recurrent Neural Networks (RNNs), Deep Neural Networks (DNNs) and so on.

However, the existing network models obtained by performing learning using Softmax loss function L_(softmax) cannot learn training samples with relatively high discrimination, and thus it is necessary to improve the loss function L_(softmax) so as to guide a network model to perform learning such that a distance between training samples belonging to a same class in the vector space is reduced and a distance between training samples belonging to different classes in the vector space is increased.

Therefore, the present disclosure proposes an information processing technology for performing training on a classification model by using a plurality of training samples. The technology according to the present disclosure first performs preprocessing on the training samples before performing the training on the training samples, so as to guide learning of the classification model, thereby achieving the technical effect of reducing an intra-class distance and increasing an inter-class distance.

Embodiments of the present disclosure will be described in more detail in combination with the drawings below.

First Embodiment

FIG. 1 is a flowchart showing an information processing method 100 according to a first embodiment of the present disclosure.

The information processing method 100 according to the first embodiment of the present disclosure performs training on a classification model by using a plurality of training samples. As shown in FIG. 1, the information processing method 100 starts at step S101. Subsequently, in step S110, a distribution of feature vectors of the plurality of training samples in a feature space is adjusted based on a typical sample in the plurality of training samples. Next, in step S120, training is performed on the classification model by using the adjusted feature vectors of the plurality of training samples. Finally, the information processing method 100 ends at step S130.

The idea of embodiments of the present disclosure lies in adding a constraint condition for training samples before training, so as to enhance discrimination between training samples of different classes.

According to an embodiment of the present disclosure, the operation of adjusting the distribution of the feature vectors in the feature space in the step S110 is performed by: selecting a training sample having a most typical feature of a class from among training samples belonging to a same class among the plurality of training samples, as a typical sample of the class; and causing feature vectors of other training samples other than the typical sample to aggregate towards a feature vector of the typical sample. The purpose of the operation is reducing discrimination between training samples belonging to a same class and increasing discrimination between training samples belonging to different classes, before performing the training on the classification model by using the plurality of training samples.

According to an embodiment of the present disclosure, the most typical feature of the class causes the classification model not to classify the typical sample into another class.

For example, when training samples are face images and are used for performing training on a face recognition model serving as a classification model, for face images belonging to a same person (i.e., training samples belonging to a same class), a face image captured in a standard environment, for example under a condition where a color contrast of the background is obvious, illumination intensity is uniform and appropriate, a face directly faces the camera lens without deviation and the like, may be taken as the typical sample. That is, the face recognition model will not classify the face image serving as the typical sample as belonging to another person.

FIG. 2A and FIG. 2B are schematic views showing examples of taking face images as training samples, wherein FIG. 2A shows a schematic view of training samples which have not been subjected to the processing in the step S110, and FIG. 2B shows a schematic view of training samples which have been subjected to the processing in the step S110.

The abscissas and ordinates in FIG. 2A and FIG. 2B represent classes, respectively. In the cases as shown in FIG. 2A and FIG. 2B, face images are taken as training samples, and belong to different classes, that is, belong to different persons. As shown in FIG. 2A, a distance between face images (training samples) belonging to different persons (classes) is not great, and thus discrimination between them is not obvious. In this case, a face recognition model obtained through training using the training samples which have not been subjected to the processing cannot achieve an excellent classification effect with respect to a sample obtained under an extreme case, for example in a case where a background color is close to a face color, illumination rays are dark and a face deviates from a camera lens and the like.

When the information processing method 100 according to the first embodiment of the present disclosure is applied, through the processing in the step S110, i.e., by taking a front face image captured in a case where a color contrast of the background is obvious and an illumination condition is ideal as the typical sample of the class (person), and adjusting a distribution of feature vectors of other training samples (face images) in the feature space, the feature vectors of the other training samples are caused to aggregate towards a feature vector of the typical sample, so as to reduce discrimination between training samples belonging to a same class and increase discrimination between training samples belonging to different classes.

According to an embodiment of the present disclosure, the processing in the step S110 may be implemented by: normalizing distances between the feature vectors of the other training samples and the feature vector of the typical sample by taking the feature vector of the typical sample as a center.
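
One plausible reading of this centering operation is sketched below, under the assumption that the typical sample's feature vector `t` is given and the remaining feature vectors are rows of `X` (all names and the radius `r` are hypothetical):

```python
import numpy as np

def center_on_typical(X, t, r=1.0, eps=1e-12):
    """Normalize the distance of each feature vector to the typical
    sample's feature vector t, taking t as the center: every offset
    (x_i - t) is rescaled to a common radius r around t."""
    offsets = X - t
    dists = np.linalg.norm(offsets, axis=1, keepdims=True)
    return t + r * offsets / (dists + eps)
```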

The step S110 of the information processing method 100 according to the first embodiment of the present disclosure will be described in combination with loss function L_(softmax) below.

Based on the operation in the step S110 of the information processing method 100, the loss function L_(softmax) in the equation (2) could be modified into the following equation (3).

$L_{softmax} = -\frac{1}{N}\sum_{i=1}^{N}\log\left(\frac{e^{\|W_{y_i}\|\,\beta\cos(\theta_{y_i,i})+b_{y_i}}}{\sum_{j=1}^{C}e^{\|W_{j}\|\,\beta\cos(\theta_{j,i})+b_{j}}}\right)$  Equation (3)

Training samples x_(i) belonging to a same class are normalized relative to the typical sample, i.e., ∥x_(i)∥=β, where β is a constant. Through such processing, in the feature space, it is possible to cause feature vectors of other training samples other than the typical sample to aggregate towards a feature vector of the typical sample, such that discrimination between training samples belonging to a same class is reduced and discrimination between training samples belonging to different classes is increased.
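
A minimal sketch of the ∥x_(i)∥ = β renormalization used in equation (3), assuming the feature vectors are the rows of a matrix `X` and β is a chosen constant (hypothetical names and default value):

```python
import numpy as np

def normalize_features(X, beta=30.0, eps=1e-12):
    """Rescale each feature vector x_i so that ||x_i|| = beta, as in
    equation (3); eps guards against zero-length vectors."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return beta * X / (norms + eps)
```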

Subsequently, as stated above, in the step S120, the information processing method 100 performs training on the classification model by using the adjusted feature vectors of the plurality of training samples, so as to obtain a final trained classification model.

By the information processing method 100, it is possible to perform pre-adjustment on the training samples before the training, such that it is possible to reduce discrimination between training samples belonging to a same class and increase discrimination between training samples belonging to different classes in the training process. The classification model trained as such is capable of performing accurate classification on samples acquired under an extreme condition.

Second Embodiment

According to the present disclosure, in addition to performing preprocessing on training samples before training, it is also possible to perform preprocessing on a classification model itself.

FIG. 3 is a flowchart showing an information processing method 300 according to a second embodiment of the present disclosure.

As shown in FIG. 3, the information processing method 300 starts at step S301. Subsequently, step S310 is performed. The step S310 in FIG. 3 is completely the same as the step S110 in FIG. 1, and thus no repeated description will be made in regard to this step for the sake of conciseness.

Referring to FIG. 3, according to the second embodiment of the present disclosure, the information processing method 300 can further comprise step S320, in which parameters of the classification model which are related to different classes are normalized, to perform training on the classification model based on the adjusted feature vectors of the plurality of training samples and the normalized parameters.

As shown in the above equations (1) to (3), the weight vector W_(j) may be understood as an axis starting from the origin in the vector space. Therefore, in the vector space, there exist C axes intersecting at the origin, which respectively correspond to C classes and simultaneously correspond to C weight vectors W_(j). Feature vectors of training samples belonging to a same class aggregate near the corresponding weight vectors W_(j).

In other words, for each class, the classification model can have parameters corresponding to the class, such as a weight vector W_(j) and a corresponding bias value b_(j).

By normalizing the weight vector W_(j), it is possible to project the weight vector W_(j) onto a same sphere in the vector space. Through the processing in the step S320, it is possible to eliminate influences of a modulus of the weight vector W_(j) upon the classification model, so as to obtain a more strict classification standard.

Based on the operation in the step S320, the loss function L_(softmax) in the equation (3) could be further modified into the following equation (4).

$L_{softmax} = -\frac{1}{N}\sum_{i=1}^{N}\log\left(\frac{e^{\alpha\beta\cos(\theta_{y_i,i})+b_{y_i}}}{\sum_{j=1}^{C}e^{\alpha\beta\cos(\theta_{j,i})+b_{j}}}\right)$  Equation (4)

Where the weight vector W_(j) is normalized, that is, ∥W_(j)∥=α, where α is a constant.
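
Analogously to the feature renormalization above, a sketch of the class-parameter normalization of equation (4), assuming the weight vectors W_(j) are the rows of a matrix `W` (hypothetical names and default value):

```python
import numpy as np

def normalize_weights(W, alpha=1.0, eps=1e-12):
    """Rescale each class weight vector W_j so that ||W_j|| = alpha,
    projecting all class axes onto the same sphere (equation (4))."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    return alpha * W / (norms + eps)
```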

In addition, referring to FIG. 3, according to the second embodiment of the present disclosure, the information processing method 300 can further comprise step S330, in which discrimination margins between different classes to which the plurality of training samples belong are increased, to perform training on the classification model based on the adjusted feature vectors of the plurality of training samples and the increased discrimination margins.

The purpose of the processing in the step S330 is also reducing discrimination between training samples belonging to a same class and increasing discrimination between training samples belonging to different classes.

To be specific, according to an embodiment of the present disclosure, the increasing discrimination margins may be performed by: for each training sample in the plurality of the training samples, adjusting a similarity degree between the training sample and a parameter of a corresponding class. As stated above, to reduce discrimination between training samples belonging to a same class, it is possible to adjust a similarity degree between the training sample and a parameter of a corresponding class, i.e., to adjust a similarity between the training sample vector x_(i) and the corresponding weight vector W_(y_i).

To be more specific, according to an embodiment of the present disclosure, the adjusting a similarity degree can comprise: multiplying an angle between a feature vector of the training sample and a feature vector of a corresponding parameter of the classification model by a coefficient m, where m>1. In other words, for the training sample vector x_(i) and the corresponding weight vector W_(y_i), it is possible to increase a similarity degree therebetween by reducing an included angle between the two vectors.

Description will be made based on loss function L_(softmax) below.

Based on the operation in the step S330 described above, the loss function L_(softmax) in the equation (4) could be further modified into the following equation (5).

$L_{softmax} = -\frac{1}{N}\sum_{i=1}^{N}\log\left(\frac{e^{\alpha\beta\,\phi(\theta_{y_i,i})+b_{y_i}}}{e^{\alpha\beta\,\phi(\theta_{y_i,i})+b_{y_i}}+\sum_{j\neq y_i}^{C}e^{\alpha\beta\cos(\theta_{j,i})+b_{j}}}\right)$  Equation (5)

Where $\phi(\theta_{y_i,i})=(-1)^{k}\cos(m\,\theta_{y_i,i})-2k$, $\theta_{y_i,i}\in\left[\frac{k\pi}{m},\frac{(k+1)\pi}{m}\right]$, $k\in[0,m-1]$, and $m\geq 1$.

That is, by introducing the coefficient m (m > 1), it is possible to reduce discrimination between training samples belonging to a same class and increase discrimination between training samples belonging to different classes. In other words, by introducing the coefficient m (m > 1), it is possible to increase discrimination margins between different classes to which the plurality of training samples belong.
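
A sketch of the piecewise function φ from equation (5) follows; unlike cos(mθ), which oscillates on [0, π] for m > 1, φ decreases monotonically over the whole interval, so a larger angle to the correct class's weight vector is always penalized more (function name hypothetical):

```python
import numpy as np

def phi(theta, m):
    """phi(theta) = (-1)^k * cos(m * theta) - 2k, where k is chosen so
    that theta lies in [k*pi/m, (k+1)*pi/m], k in [0, m-1] (equation (5))."""
    k = np.minimum(np.floor(theta * m / np.pi), m - 1)
    return ((-1.0) ** k) * np.cos(m * theta) - 2.0 * k
```

For m = 4, for example, φ maps θ ∈ [0, π] monotonically from 1 down to −7, whereas plain cos θ only falls from 1 to −1; this steeper penalty is what widens the inter-class margin.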

The coefficient m may be selected based on various principles.

For example, according to an embodiment of the present disclosure, the coefficient m may be selected such that a maximum angle feature distance within a same class is less than a minimum angle feature distance between different classes. That is, by introducing the coefficient m, a maximum value of included angles between feature vectors of all training samples belonging to a same class in the vector space is less than a minimum value of included angles between feature vectors of training samples belonging to different classes in the vector space, such that discrimination of the training samples belonging to the same class is relatively small and discrimination of the training samples belonging to the different classes is relatively large.
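
As an illustration of this selection criterion, the two angle statistics it compares can be computed directly from labeled features; the sketch below (hypothetical names throughout) returns the maximum intra-class and minimum inter-class included angles, which a chosen m should separate:

```python
import numpy as np

def angle_stats(X, y):
    """Return (max intra-class angle, min inter-class angle) in radians
    over all pairs of feature vectors in X with integer labels y."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    angles = np.arccos(np.clip(Xn @ Xn.T, -1.0, 1.0))
    same = y[:, None] == y[None, :]
    off_diag = ~np.eye(len(y), dtype=bool)
    max_intra = angles[same & off_diag].max()
    min_inter = angles[~same].min()
    return max_intra, min_inter
```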

Moreover, according to an embodiment of the present disclosure, the coefficient m is reduced when the count C of the classes to which the plurality of training samples belong is increased; and the coefficient m is increased when the count C is reduced. As stated above, by increasing the coefficient m, it is possible to reduce included angles between feature vectors of training samples belonging to a same class in the vector space and to increase included angles between feature vectors of training samples belonging to different classes in the vector space. However, the coefficient m is related to the count C of the classes. To be specific, when the count C is relatively large, the value of the coefficient m cannot be excessively increased, since the distribution of the weight vectors W_(j) in the vector space is relatively dense. For the same reason, when the count C is relatively small, the value of the coefficient m can be appropriately increased, since the distribution of the weight vectors W_(j) in the vector space is relatively sparse.

Subsequently, as shown in FIG. 3, in step S340, it is possible to perform training on the classification model based on the adjusted feature vectors of the plurality of training samples, the normalized parameters and the increased discrimination margins, to obtain a trained classification model. Finally, the information processing method 300 ends at step S350.

The classification model trained by the information processing method 300 is capable of performing accurate classification on samples acquired under an extreme condition.

Although the information processing method 300 has been described above in the order of the steps S310, S320, S330, those skilled in the art should appreciate that there exists no particular limitation to the execution order of the above steps. In other words, the steps S310, S320, S330 may be executed in any order or may be simultaneously executed, and all of these variant solutions should be covered within the scope of the present disclosure. Moreover, those skilled in the art should also appreciate that the steps S320 and S330 are not essential for the information processing method according to the embodiment of the present disclosure. In other words, it is possible to execute only the step S310 but not execute the steps S320 and S330, or it is possible to execute the step S310 and one of the steps S320 and S330.

To more intuitively describe the technology according to the present disclosure, interpretations of the respective steps of the information processing method 300 according to the present disclosure will be described in combination with FIG. 4A, FIG. 4B, FIG. 4C and FIG. 4D.

FIG. 4A, FIG. 4B, FIG. 4C and FIG. 4D are schematic views showing geometric interpretations of the respective steps of the information processing method 300 according to the second embodiment of the present disclosure.

To be specific, FIG. 4A shows the case where no preprocessing is performed on the classification model and the training samples. As shown in FIG. 4A, for example, the training samples belong to two classes (represented by circular points in light color and circular points in dark color, respectively), projections of parameters such as weight vectors of the two classes in the vector space are W₁ and W₂, and a distribution of feature vectors of the training samples in the vector space is around the two weight vectors. To facilitate the understanding, FIG. 4A shows a boundary for deciding the classes.

FIG. 4B shows the case where preprocessing (the step S330) of increasing inter-class discrimination margins is performed on the classification model. As shown in FIG. 4B, by executing the step S330, the inter-class decision boundary is extended from a line into a sector and the training samples of the respective classes aggregate (represented by arrows in light color in the figure) towards the corresponding weight vectors, such that discrimination between training samples belonging to a same class is relatively small and discrimination between training samples belonging to different classes is relatively large.

FIG. 4C shows the case where preprocessing (the step S320) of class parameter normalization is further performed on the classification model. As shown in FIG. 4C, by executing the step S320, the weight vectors W₁ and W₂ are normalized and thus are capable of being projected onto the same sphere in the vector space, thereby eliminating the influences of the moduli of the weight vectors upon the training process of the classification model, so as to obtain a more strict classification standard.

FIG. 4D shows the case where preprocessing (the step S110 or S310) of adjusting the distribution of the feature vectors based on the typical samples is further performed on the training samples. As shown in FIG. 4D, by executing the step S110 or S310, training samples belonging to different classes aggregate (represented by arrows in light color in the figure) towards the corresponding typical samples respectively, such that discrimination between training samples belonging to a same class is relatively smaller and discrimination between training samples belonging to different classes is relatively larger.

The information processing method according to the present disclosure can perform pre-adjustment on the training samples and the classification model before training, such that it is possible to reduce discrimination between training samples belonging to a same class and increase discrimination between training samples belonging to different classes in the training process. The classification model trained as such is capable of performing accurate classification on samples acquired under an extreme condition.

FIG. 5 is a block diagram showing an information processing apparatus 500 according to an embodiment of the present disclosure.

As shown in FIG. 5, the information processing apparatus 500 for performing training on a classification model by using a plurality of training samples comprises: an adjusting unit 501 to adjust a distribution of feature vectors of the plurality of training samples in a feature space based on a typical sample in the plurality of training samples; and a learning unit 502 to perform training on the classification model by using the adjusted feature vectors of the plurality of training samples.

The adjusting unit 501 is configured to perform the processing in the step S110 of the method 100 described above with reference to FIG. 1 or the step S310 of the method 300 described above with reference to FIG. 3 and can gain the benefits related to the processing, and descriptions thereof are omitted here.

The learning unit 502 is configured to perform the processing in the step S120 of the method 100 described above with reference to FIG. 1 or the step S340 of the method 300 described above with reference to FIG. 3 and can gain the benefits related to the processing, and descriptions thereof are omitted here.

In addition, the present disclosure further proposes an information processing method, which detects data to be detected, by using a classification model obtained by performing training by the information processing methods as described above. By performing training by the information processing methods as described above, it is possible to acquire a classification model having a better classification effect for samples acquired under an extreme condition, and to use the classification model to perform classification on unlabeled samples (i.e., the data to be detected).
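
As a usage illustration, once the model is trained, classifying an unlabeled sample reduces to finding the class whose weight vector forms the smallest included angle with the sample's feature vector; a hypothetical sketch (at inference time the margin coefficient m is no longer needed, and dropping the biases is a common simplification, not a requirement of the disclosure):

```python
import numpy as np

def classify(W, X):
    """Predict a class for each feature vector in X by maximum cosine
    similarity to the class weight vectors W_j."""
    Wn = W / np.linalg.norm(W, axis=1, keepdims=True)
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return (Xn @ Wn.T).argmax(axis=1)   # smallest angle = largest cosine
```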

FIG. 6 is a structure diagram of a general-purpose machine 600 that can be used to realize the information processing methods 100, 300 and the information processing apparatus 500 which perform training on a classification model by using a plurality of training samples according to the embodiments of the present disclosure. It should be noted that, the general-purpose machine 600 is only an example, but does not imply a limitation to the use range or function of the methods and apparatus of the present disclosure. The general-purpose machine 600 should also not be construed to have a dependency on or a demand for any assembly or combination thereof as shown in the above methods and apparatus which perform training on a classification model by using a plurality of training samples.

In FIG. 6, a central processing unit (CPU) 601 performs various processing according to programs stored in a read-only memory (ROM) 602 or programs loaded from a storage part 608 to a random access memory (RAM) 603. In the RAM 603, data needed when the CPU 601 performs various processes and the like is also stored, as needed. The CPU 601, the ROM 602 and the RAM 603 are connected to each other via a bus 604. An input/output interface 605 is also connected to the bus 604.

The following components are connected to the input/output interface 605: an input part 606 (including keyboard, mouse and the like), an output part 607 (including display such as cathode ray tube (CRT), liquid crystal display (LCD) and the like, and loudspeaker and the like), a storage part 608 (including hard disc and the like), and a communication part 609 (including network interface card such as LAN card, modem and the like). The communication part 609 performs communication processing via a network such as the Internet. A driver 610 may also be connected to the input/output interface 605, as needed. As needed, a removable medium 611, such as a magnetic disc, an optical disc, a magneto-optical disc, a semiconductor memory and the like, may be installed in the driver 610, such that a computer program read therefrom is installed in the storage part 608 as needed.

In the case where the foregoing series of processing is implemented through software, programs constituting the software are installed from a network such as the Internet or a memory medium such as the removable medium 611.

It should be understood by those skilled in the art that such a memory medium is not limited to the removable mediums 611 as shown in FIG. 6, in which programs are stored and which are distributed separately from the apparatus to provide the programs to users. Examples of the removable medium 611 include a magnetic disc (including floppy disc (registered trademark)), a compact disc (including compact disc read-only memory (CD-ROM) and digital video disk (DVD)), a magneto-optical disc (including mini disc (MD) (registered trademark)), and a semiconductor memory. Alternatively, the memory mediums may be hard discs included in the ROM 602 and the storage part 608, in which programs are stored and which are distributed together with the apparatus containing them to users.

In addition, the present disclosure further proposes a program product having stored thereon machine-readable instruction codes that, when read and executed by a machine, can implement the above information processing method according to the present disclosure which performs training on a classification model by using a plurality of training samples. Accordingly, the various storage media for carrying such a program product which are enumerated above are also included within the scope of the present disclosure.

Detailed descriptions have been made above by means of block diagrams, flowcharts and/or embodiments, to describe specific embodiments of the apparatus and/or methods according to the embodiments of the present application. When these block diagrams, flowcharts and/or embodiments include one or more functions and/or operations, those skilled in the art appreciate that the various functions and/or operations in these block diagrams, flowcharts and/or embodiments may be implemented individually and/or jointly through various hardware, software, firmware or essentially any combination thereof. In an embodiment, several parts of the subject matter described in the present specification may be realized by an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP) or other integrated circuits. However, those skilled in the art would appreciate that some aspects of the embodiments described in the present specification may, completely or partially, be equivalently implemented in the form of one or more computer programs running on one or more computers (e.g., in the form of one or more computer programs running on one or more computer systems), in the form of one or more computer programs running on one or more processors (e.g., in the form of one or more computer programs running on one or more microprocessors), in the form of firmware, or in the form of essentially any combination thereof; moreover, according to the disclosure of the present specification, designing circuitry used for the present disclosure and/or compiling codes of software and/or firmware used for the present application are completely within the range of the capability of those skilled in the art.

It should be emphasized that the term “comprise/include”, as used herein, refers to the presence of a feature, an element, a step or an assembly but does not preclude the presence or addition of one or more other features, elements, steps or assemblies. The terms “first”, “second” and the like relating to ordinal numbers do not represent an implementation sequence or importance degree of features, elements, steps or assemblies defined by these terms, but are only used to perform identification among these features, elements, steps or assemblies for the sake of clarity of descriptions.

In conclusion, in the embodiments of the present disclosure, the present disclosure provides the following solutions, but is not limited thereto:

Solution 1. An information processing method, which performs training on a classification model by using a plurality of training samples, the method comprising:

adjusting a distribution of feature vectors of the plurality of training samples in a feature space based on a typical sample in the plurality of training samples; and

performing training on the classification model by using the adjusted feature vectors of the plurality of training samples.

Solution 2. The information processing method according to solution 1, wherein the typical sample is a training sample having a most typical feature of a class, the adjusting comprising:

causing feature vectors of other training samples other than the typical sample to aggregate towards a feature vector of the typical sample.

Solution 3. The information processing method according to solution 2, wherein the most typical feature of the class causes the classification model not to classify the typical sample into another class.

Solution 4. The information processing method according to solution 2, wherein the aggregating comprises:

normalizing distances between the feature vectors of the other training samples and the feature vector of the typical sample by taking the feature vector of the typical sample as a center.

Solution 5. The information processing method according to solution 1, wherein for each class, the classification model has a parameter corresponding to the class,

the method further comprising:

normalizing parameters of the classification model which are related to different classes, to perform training on the classification model based on the adjusted feature vectors of the plurality of training samples and the normalized parameters.

Solution 6. The information processing method according to solution 1, further comprising:

increasing discrimination margins between different classes to which the plurality of training samples belong, to perform training on the classification model based on the adjusted feature vectors of the plurality of training samples and the increased discrimination margins.

Solution 7. The information processing method according to solution 6, wherein the increasing discrimination margins comprises:

for each training sample in the plurality of the training samples, adjusting a similarity degree between the training sample and a parameter of a corresponding class.

Solution 8. The information processing method according to solution 7, wherein the adjusting a similarity degree comprises: multiplying an angle between a feature vector of the training sample and a feature vector of a corresponding parameter of the classification model by a coefficient m, where m>1.

Solution 9. The information processing method according to solution 8, wherein the coefficient m is selected such that a maximum angle feature distance within a same class is less than a minimum angle feature distance between different classes.

Solution 10. The information processing method according to solution 8, wherein the coefficient m is reduced when the count of the classes to which the plurality of training samples belong is increased; and the coefficient m is increased when the count is reduced.

Solution 11. The information processing method according to solution 1, wherein the classification model is Softmax function, the parameters are weights of Softmax function with respect to different classes, and the training samples are inputs used in a training process using Softmax function.

Solution 12. An information processing apparatus, which performs training on a classification model by using a plurality of training samples, the apparatus comprising:

an adjusting unit to adjust a distribution of feature vectors of the plurality of training samples in a feature space based on a typical sample in the plurality of training samples; and

a learning unit to perform training on the classification model by using the adjusted feature vectors of the plurality of training samples.

Solution 13. An information processing method, comprising detecting data to be detected, by using a classification model obtained by performing training by the information processing methods according to solutions 1 to 11.

While the present disclosure has been described above with reference to the descriptions of the specific embodiments of the present disclosure, it should be understood that those skilled in the art could carry out various modifications, improvements or equivalents on the present disclosure within the spirit and scope of the appended claims. The modifications, improvements or equivalents should also be considered as being included in the scope of protection of the present disclosure.

CLAIMS

1. An information processing method by a network of computer implemented processes to perform training on a classification model by using a plurality of training samples, the method comprising: adjusting a distribution of feature vectors of the plurality of training samples in a feature space based on a typical sample in the plurality of training samples; and performing training on the classification model by using the adjusted feature vectors of the plurality of training samples.

2. The information processing method according to claim 1, wherein the typical sample is a training sample having a most typical feature of a class in the classification model, and the adjusting comprises: causing feature vectors of other training samples other than the typical sample to aggregate towards a feature vector of the typical sample.

3. The information processing method according to claim 2, wherein the most typical feature of the class causes the classification model not to classify the typical sample into another class.

4. The information processing method according to claim 2, wherein to aggregate towards the target feature vector comprises: normalizing distances between the feature vectors of the other training samples and the feature vector of the typical sample by taking the feature vector of the typical sample as a center.

5. The information processing method according to claim 1, wherein for each class in the classification model, the classification model has a parameter corresponding to the class, the method further comprising: normalizing parameters of the classification model which are related to different classes, to perform training on the classification model based on the adjusted feature vectors of the plurality of training samples and the normalized parameters.

6. The information processing method according to claim 1, further comprising: increasing discrimination margins between different classes in the classification model to which the plurality of training samples belong, to perform training on the classification model based on the adjusted feature vectors of the plurality of training samples and the increased discrimination margins.

7. The information processing method according to claim 6, wherein the increasing discrimination margins comprises: for each training sample in the plurality of the training samples, adjusting a similarity degree between the training sample and a parameter of a corresponding class.

8. The information processing method according to claim 7, wherein the adjusting a similarity degree comprises: multiplying an angle between a feature vector of the training sample and a feature vector of a corresponding parameter of the classification model by a coefficient m, where m>1.

9. An information processing apparatus, which performs training on a classification model by using a plurality of training samples, the apparatus comprising: a processor coupled to a memory and to: adjust a distribution of feature vectors of the plurality of training samples in a feature space based on a typical sample in the plurality of training samples; and perform training on the classification model by using the adjusted feature vectors of the plurality of training samples.

10. An information processing method by a computer, comprising: detecting data to be detected, by using a classification model obtained by performing training of the classification model using a plurality of training samples by, adjusting a distribution of feature vectors of the plurality of training samples in the feature space based on a typical sample in the plurality of training samples; and performing training on the classification model by using the adjusted feature vectors of the plurality of training samples.

11. The method according to claim 10, wherein the typical sample is a training sample having a most typical feature of a class in the classification model, and the adjusting comprises causing feature vectors of other training samples other than the typical sample to aggregate towards a feature vector of the typical sample.

13. The method according to claim 11, wherein the most typical feature of the class causes the classification model not to classify the typical sample into another class.

14. The method according to claim 11, wherein to aggregate towards the target feature vector comprises: normalizing distances between the feature vectors of the other training samples and the feature vector of the typical sample by taking the feature vector of the typical sample as a center.

15. An information processing apparatus comprising a processor coupled to a memory and to detect data to be detected, by using a classification model obtained by performing training of the classification model using a plurality of training samples in a feature space according to claim 1.